Allele identification in assembled genomic sequence datasets.

Authors

  • Katrina M Dlugosch
  • Aurélie Bonin
Abstract

Allelic variation within species provides fundamental insights into the evolution and ecology of organisms, and information about this variation is becoming increasingly available in sequence datasets of multiple and/or outbred individuals. Unfortunately, identifying true allelic variants poses a number of challenges, given the presence of both sequencing errors and alleles from other closely related loci. We outline the key considerations involved in this process, including assessing the accuracy of allele resolution in sequence assembly, clustering of alleles within and among individuals, and identifying clusters that are most likely to correspond to true allelic variants of a single locus. Our focus is particularly on the case where alleles must be identified without a fully resolved reference genome, and where sequence depth information cannot be used to infer the putative number of loci sharing a sequence, such as in transcriptome or post-assembly datasets. Throughout, we provide information about publicly available tools to aid allele identification in such cases.

Related resources

Allele Identification for Transcriptome-Based Population Genomics in the Invasive Plant Centaurea solstitialis

Transcriptome sequences are becoming more broadly available for multiple individuals of the same species, providing opportunities to derive population genomic information from these datasets. Using the 454 Life Science Genome Sequencer FLX and FLX-Titanium next-generation platforms, we generated 11-430 Mbp of sequence for normalized cDNA for 40 wild genotypes of the invasive plant Centaurea sol...

VirSorter: mining viral signal from microbial genomic data

Viruses of microbes impact all ecosystems where microbes drive key energy and substrate transformations including the oceans, humans and industrial fermenters. However, despite this recognized importance, our understanding of viral diversity and impacts remains limited by too few model systems and reference genomes. One way to fill these gaps in our knowledge of viral diversity is through the d...

Comparative bioinformatics analysis of a wild diploid Gossypium with two cultivated allotetraploid species

Background: Gossypium thurberi is a wild diploid species that has been used to improve cultivated allotetraploid cotton. G. thurberi belongs to D genome, which is an important wild bio-source for the cotton breeding and genetic research. To a certain degree, chloroplast DNA sequence information are a versatile tool for species identification and phylogenetic implications in plants. Different ch...

Evaluation of anaerobic pathogens in periodontitis patients and its relationship with TGF-1β genomic polymorphism by Tetra Arms-PCR method

Background and Aims: Periodontitis is a common and inflammatory infectious disease that causes damage to the tissues supporting the tooth and consequent tooth loss. Periodontal disease is a multimicrobial and multifactorial disease and important anaerobic bacteria are involved in periodontal infection. TGF-1β is one of the growth factors and anti-inflammatory cytokines that play a crucial role ...

Defining Cellulase in the Glycosyl Hydrolase Family 48 Sequence, Structure, and Evolution of Cellulases in the Glycoside Hydrolase Family 48

Background: Cellulases are non-homologous isofunctional enzymes, which prevents their unambiguous identification in genomic datasets. Results: Cellulases from glycoside hydrolase family 48 have distinct evolutionarily conserved sequence and structural features. Conclusion: Conserved sequence/structure features can be used to differentiate cellulases from non-cellulases in genomic datasets. Sign...

Journal:
  • Methods in molecular biology

Volume 888

Publication date: 2012